Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Fix memory error in threaded engine #6084

Merged
merged 1 commit into from
May 3, 2017
Merged

Fix memory error in threaded engine #6084

merged 1 commit into from
May 3, 2017

Conversation

sifmelcara
Copy link
Contributor

This bug make mxnet frequently get segfault on my machine, after some debugging and code tracing I realized it happens in the following condition:

  1. Executor calls DeleteOperator
  2. DeleteOperator pushes ThreadedOpr::Delete(threaded_opr); into the threaded engine.
  3. But the write dependency of ThreadedOpr::Delete(threaded_opr); is not satisfied because operation A is using a dependent variable of ThreadedOpr::Delete(threaded_opr);, so ThreadedOpr::Delete(threaded_opr) waits A.
  4. Operation A finishes and calls ThreadedEngine::OnComplete(A).
  5. We mark write dependencies of A as complete (these code).
  6. After step 5 (this line), the function pushed in step 2 may immediately execute, thus the object which threaded_opr points to is destroyed and the memory address may be occupied by other objects.
  7. We read threaded_opr->temporary in this line, and get a non-zero garbage value, then executes ThreadedOpr::Delete(threaded_opr); and destroy an innocent object threaded_opr points to. (Probably also a threaded_opr because object pool being used)

This patch fixes the issue.

cc original author @tqchen

@tqchen
Copy link
Member

tqchen commented May 3, 2017

Great catch!

@tqchen
Copy link
Member

tqchen commented May 3, 2017

This rare failure problem has bugged me for more than a year and I have to say you make my day! Thanks for the great observation

@hotpxl
Copy link
Contributor

hotpxl commented May 3, 2017

@GaiYu0 Does this patch fix your problem with threaded engine as well?

@sifmelcara
Copy link
Contributor Author

@tqchen You are welcome :)
I spent time to debug this because I want to use this fantastic library.

@GaiYu0
Copy link

GaiYu0 commented May 5, 2017

The fix works for me. @sifmelcara Thank you very much!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants